1. Introduction

This is a notebook for a past Kaggle competition, HuBMAP - Hacking the Human Vasculature. The goal of the competition is to detect blood vessels in microscope images of kidney tissue; a detection mask counts as correct when its IoU (Intersection over Union) with the ground truth is greater than 0.6. The Kaggle score is the Average Precision over confidence scores, the same metric as in Open Images 2019 - Instance Segmentation.

This HuBMAP competition seems to be difficult for a few reasons. First, it looks almost impossible for non-experts to correctly identify blood vessels (refer to the EDA), so it should be hard for deep learning too. Second, only 1633 labeled images are given, which is insufficient for such a complex segmentation task. Another reason is label imbalance: only 3.3% of image pixels are positive. Therefore, I defined the following targets for this project.

Target

2. Data

There are 1633 training images with their labels (polygons), plus tile information (tile_df) that indicates the dataset and the source WSI of each tile. Each source WSI corresponds to a human subject whose profile is described in wsi_df.

Data Source

https://www.kaggle.com/competitions/hubmap-hacking-the-human-vasculature/data

Reading Images

This code reads the images from the folder.

3. Data Cleaning

The image quality is good, so no cleaning is necessary. The problem is that the labels are given as polygon (geometry) data; for image segmentation, they must be converted to pixel-wise labels. This conversion was done in another notebook. Put simply, it checks whether each pixel lies inside the polygon using breadth-first search (BFS). Starting the BFS from one of the polygon's elements significantly reduced the computation time compared with checking every pixel.
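The original notebook used a BFS flood fill for this conversion; as a hedged illustration of the same polygon-to-mask idea, the sketch below uses a naive scanline even-odd (ray casting) rule instead. The function name `polygon_to_mask` is hypothetical, and a per-pixel scanline like this is slower than the BFS approach described above.

```python
import numpy as np

def polygon_to_mask(vertices, height, width):
    """Rasterize one polygon into a boolean pixel mask (even-odd rule).

    vertices: list of (x, y) polygon points.
    A pixel is inside if a ray from its center crosses an odd number of edges.
    """
    pts = np.asarray(vertices, dtype=float)
    xs = np.arange(width) + 0.5           # pixel-center x coordinates
    mask = np.zeros((height, width), dtype=bool)
    for row in range(height):
        y = row + 0.5                     # pixel-center y coordinate
        crossings = np.zeros(width, dtype=int)
        for (x0, y0), (x1, y1) in zip(pts, np.roll(pts, -1, axis=0)):
            if (y0 <= y) != (y1 <= y):    # edge crosses this scanline
                x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
                crossings += xs < x_cross # edges to the right of each pixel
        mask[row] = crossings % 2 == 1    # odd count -> inside the polygon
    return mask
```

For example, a square with corners (1, 1) and (5, 5) covers the 4x4 block of pixels whose centers fall inside it.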

The following figure shows one of the given images, its polygon label, and the corresponding pixel-wise label.

4. EDA

4.1 Basic Information

Images and labels are provided from two datasets and four sources (persons), and the number of images differs between them. The Kaggle test data is taken only from dataset 1, but it is hidden from participants.

4.2 Images and Labels

Next, the images and labels are visualized per dataset and source (person). The size and number of blood vessels differ between sources.

Dataset 1

In dataset 1, source 1 contains fewer but larger blood vessels, whereas source 2 has many smaller ones.

Dataset 2

The images and labels of dataset 2 look similar to those of dataset 1, but their color is slightly different.

4.3 Data Imbalance

Since it was hard to describe the difference in distributions from the images alone, the following statistics are calculated per image, and their distributions are plotted.

The function label_stat calculates the number of blood vessels and their mean size. OpenCV's cv2.connectedComponents takes a binary image as input, counts the connected regions, and assigns a label to each of them in a 2D matrix. densty_plot and scatter_plot visualize the positive-pixel ratio, and the mean size and number of blood vessels, respectively.

Ratio of positive pixel

Overall, only 3.3% of the pixels are positive (blood vessel). When calculated per dataset and source, dataset 1 has more positive pixels than dataset 2.

The positive-pixel ratio is calculated for each image, and the distributions are plotted in the following density plots.

Dataset 1

Dataset 2

Size and quantity of Blood Vessels

Dataset 1 source 1 has a different distribution from the others: it has fewer blood vessels, but they are larger. Note that each dot corresponds to one image in the scatter plots.

Dataset 1

Dataset 2

5. Models

For image segmentation, a Fully Convolutional Network (FCN) and a U-Net are developed. Both are trained, and their validation results are compared.

5.1 FCN (Fully Convolutional Network)

The Fully Convolutional Network (FCN) has two types of elements. The first is ConvBlock, which consists of a Conv2D layer followed by BatchNormalization and ReLU, and optionally downsamples its input with MaxPooling2D. ConvBlocks are stacked sequentially, much like the VGG16 or AlexNet architecture.

Unlike architectures for image classification, however, the FCN has no fully connected layer. Instead, it has an UpsampleBlock that takes a 64x64x512 input and generates a 512x512x1 output with Conv2DTranspose.
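The two building blocks can be sketched in Keras as follows. Only the 64x64x512 bottleneck and 512x512x1 output shapes come from the text above; the number of ConvBlocks, the filter counts, and the Conv2DTranspose kernel size are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, pool=True):
    """ConvBlock: Conv2D -> BatchNormalization -> ReLU, optional 2x downsampling."""
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    if pool:
        x = layers.MaxPooling2D()(x)
    return x

inputs = layers.Input((512, 512, 3))
x = conv_block(inputs, 64)            # -> 256x256
x = conv_block(x, 128)                # -> 128x128
x = conv_block(x, 256)                # -> 64x64
x = conv_block(x, 512, pool=False)    # -> 64x64x512 bottleneck
# UpsampleBlock: one Conv2DTranspose from 64x64x512 back to 512x512x1
outputs = layers.Conv2DTranspose(1, 16, strides=8, padding="same",
                                 activation="sigmoid")(x)
fcn = tf.keras.Model(inputs, outputs)
```

Because the whole network is convolutional, the 512x512 output keeps a per-pixel probability for every input pixel.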

5.2 U-NET

U-Net is basically similar to a convolutional autoencoder, with a downsampling encoder and an upsampling decoder. What makes U-Net unique are the skip connections from the encoder to the decoder at each level. This structure resembles the V-model.

LeftBlock, defined in the code below, has a Conv2D followed by BatchNormalization and ReLU, and optionally downsamples its input with MaxPooling2D. RightBlock upsamples its input with Conv2DTranspose and concatenates it with the skip connection from the corresponding LeftBlock. It is actually simple when drawn as a figure.
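A minimal sketch of this structure, using the LeftBlock/RightBlock naming from the text; the depth and filter counts are illustrative assumptions, not the notebook's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def left_block(x, filters):
    """Encoder side: Conv2D -> BatchNormalization -> ReLU."""
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return x

def right_block(x, skip, filters):
    """Decoder side: upsample with Conv2DTranspose, then concatenate the skip."""
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
    x = layers.Concatenate()([x, skip])
    return left_block(x, filters)

inputs = layers.Input((512, 512, 3))
s1 = left_block(inputs, 32)                      # 512x512
s2 = left_block(layers.MaxPooling2D()(s1), 64)   # 256x256
s3 = left_block(layers.MaxPooling2D()(s2), 128)  # 128x128
b  = left_block(layers.MaxPooling2D()(s3), 256)  # 64x64 bottleneck
x = right_block(b, s3, 128)                      # 128x128, skip from s3
x = right_block(x, s2, 64)                       # 256x256, skip from s2
x = right_block(x, s1, 32)                       # 512x512, skip from s1
outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
unet = tf.keras.Model(inputs, outputs)
```

The skip connections let the decoder reuse high-resolution features from the encoder, which is what distinguishes U-Net from a plain autoencoder.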

Figure of U-net

6. Training and Hyperparameter Tuning

6.1 Split Dataset

80% of the images are randomly selected for training.
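A minimal sketch of such a random 80/20 split over the 1633 image indices; the fixed seed is a hypothetical choice for reproducibility, not necessarily what the notebook used.

```python
import numpy as np

rng = np.random.default_rng(42)          # hypothetical seed for reproducibility
n_images = 1633
idx = rng.permutation(n_images)          # shuffle all image indices
n_train = int(n_images * 0.8)
train_idx, val_idx = idx[:n_train], idx[n_train:]
```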

val_tile_df is later used to analyze validation result.

6.2 Custom Loss Function

Since the labels are imbalanced (refer to chapter 4, EDA), plain binary cross entropy would not work well. Instead, the custom_loss function calculates binary cross entropy for positive and negative pixels separately. Because TensorFlow's BCE contains a reduce_mean, the positive and negative losses can be balanced this way; it is equivalent to a weighted log loss. The positive and negative losses are then summed, with a weight applied to the negative loss. A negative-loss weight of 2 or 3 gave good results.
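The idea can be sketched as below. This is one plausible form of such a custom_loss, written out explicitly rather than via Keras's BCE; the default neg_weight of 2 follows the range mentioned above.

```python
import tensorflow as tf

def custom_loss(y_true, y_pred, neg_weight=2.0, eps=1e-7):
    """BCE averaged separately over positive and negative pixels, then summed
    with a weight on the negative part (a weighted log loss)."""
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    pos = tf.cast(y_true > 0.5, tf.float32)
    neg = 1.0 - pos
    # mean log loss over positive pixels only
    pos_loss = -tf.reduce_sum(pos * tf.math.log(y_pred)) / (tf.reduce_sum(pos) + eps)
    # mean log loss over negative pixels only
    neg_loss = -tf.reduce_sum(neg * tf.math.log(1.0 - y_pred)) / (tf.reduce_sum(neg) + eps)
    return pos_loss + neg_weight * neg_loss
```

Averaging each part over its own pixel count is what keeps the rare positive pixels from being drowned out by the 96.7% negative pixels.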

6.3 Image Augmentation

The function augment is applied to both the image and the label, because when the image is flipped, the label must follow. The first part of the augmentation is a horizontal and a vertical flip, each with probability 0.5. Next comes rotation, with the angle selected from 0, 90, 180 and 270 degrees. Finally, the image and label are zoomed with probability 0.3: an upper-left pixel is selected at random, then a crop length; the crop is clipped and resized back to 512x512.
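An eager-mode sketch of such an augment function using tf.image; the crop-size range (256 to 511 pixels) is an assumption for illustration, and the nearest-neighbor resize on the label is one way to keep it binary.

```python
import tensorflow as tf

def augment(image, label):
    """Apply identical random flips, rotation, and zoom to image and label."""
    if tf.random.uniform(()) < 0.5:                       # horizontal flip
        image = tf.image.flip_left_right(image)
        label = tf.image.flip_left_right(label)
    if tf.random.uniform(()) < 0.5:                       # vertical flip
        image = tf.image.flip_up_down(image)
        label = tf.image.flip_up_down(label)
    k = tf.random.uniform((), 0, 4, dtype=tf.int32)       # 0/90/180/270 degrees
    image = tf.image.rot90(image, k)
    label = tf.image.rot90(label, k)
    if tf.random.uniform(()) < 0.3:                       # random zoom
        size = tf.random.uniform((), 256, 512, dtype=tf.int32)
        top = tf.random.uniform((), 0, 512 - size, dtype=tf.int32)
        left = tf.random.uniform((), 0, 512 - size, dtype=tf.int32)
        image = tf.image.crop_to_bounding_box(image, top, left, size, size)
        label = tf.image.crop_to_bounding_box(label, top, left, size, size)
        image = tf.image.resize(image, (512, 512))
        label = tf.image.resize(label, (512, 512), method="nearest")
    return image, label
```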

6.4 Custom train loop

The custom train loop train_loop consists of the augment function, train_step (which calculates the loss and applies the gradients), and some functions that compute validation results (cal_IoU, predict_probability, val_score). The @tf.function decorator on train_step significantly accelerates training by compiling it into a computation graph. Furthermore, to keep the best model, train_loop saves the trained model whenever the validation loss reaches a new low.
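A minimal skeleton of such a train_step; the tiny one-layer model and Adam optimizer here are placeholders purely to make the sketch self-contained, not the notebook's actual configuration.

```python
import tensorflow as tf

# hypothetical tiny model so the sketch is runnable on its own
inputs = tf.keras.Input((32, 32, 3))
outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(inputs)
model = tf.keras.Model(inputs, outputs)
optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.BinaryCrossentropy()

@tf.function  # traced into a computation graph on the first call
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_fn(y, pred)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```

After the first call traces the graph, subsequent calls skip Python overhead entirely, which is where the speedup comes from.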

6.5 Hyperparameter Tuning

After running the notebooks many times, the following hyperparameters were tuned. Only the results are shown here, because demonstrating the whole process is not possible within the limited computation time.

General

FCN

U-NET

6.6 Training FCN

This model was trained beforehand in a separate notebook to avoid an OutOfMemoryError, because training two models in one notebook was not possible. The cell below loads the trained results.

import trained model and result

6.7 Training U-NET

This model was trained beforehand in a separate notebook to avoid an OutOfMemoryError, because training two models in one notebook was not possible. The cell below loads the trained results.

import trained result

7. Analysis of Validation Result

7.1 Validation Loss

There are many ways to analyze the results. First, the validation losses of both models are compared. According to the plots, both models learned well, but U-NET has a slightly smaller loss than FCN.

7.2 Predicted labels

Since both FCN and U-Net trained well, they should be able to predict labels with good accuracy. The function plot_result visualizes an image, its label, and the prediction results of FCN and U-Net.

Prediction by FCN

Prediction by U-NET

Results

Ten prediction results are shown below. Both models predict reasonably well, although this is not a simple task. The true label is green and the predicted label is red; the predicted label is created by thresholding the predicted probability at 0.8. When both are plotted in the same image (refer to Compare (Y:tp)), true positives appear yellow. The precision varies: some images are predicted very well, others are not. The next question is: does the result depend on the dataset?

7.3 Average IoU

Finally, the average IoU is calculated: predict the label, compute the IoU of each image, then take the mean. Overall, both FCN and U-NET achieved the target of an average IoU > 0.5 in validation. When broken down by dataset/source, however, neither model reached 0.5 on dataset 1 source 2. The reason could be the small number of dataset 1 source 2 images (refer to chapter 4 EDA, section 4.1 Basic Information). To examine the relationship between the number of images and the validation score, a scatter plot of average IoU vs. training image quantity was created; the two appear somewhat correlated.
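The metric described above can be sketched in NumPy as follows; the 0.8 probability cutoff matches the threshold used for the predicted labels in section 7.2.

```python
import numpy as np

def iou(y_true, y_pred):
    """IoU between two boolean masks of the same shape."""
    inter = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return inter / union if union else 1.0

def average_iou(labels, probs, threshold=0.8):
    """Threshold each predicted probability map, then average per-image IoU."""
    return float(np.mean([iou(t, p > threshold) for t, p in zip(labels, probs)]))
```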

Average IoU

Average IoU by dataset/source

7.4 IoU Distribution

These density plots show the IoU distribution. The first is the overall distribution; the later ones are broken down by dataset/source. For both models, the distributions are approximately normal.

For dataset1/source1, U-NET has a peak greater than 0.6.

For dataset2/source1, both models have a peak greater than 0.6.

8. Kaggle Score

8.1 Evaluation Metric

This competition adopted a different metric for scoring.

Submissions are evaluated by computing the Average Precision over confidence scores. ... Segmentation is calculated using IoU with a threshold of 0.6.

https://www.kaggle.com/competitions/hubmap-hacking-the-human-vasculature/overview/evaluation

Unlike the average IoU, this metric counts true positives and false positives per blood vessel: a predicted blood vessel is a true positive when its IoU exceeds 0.6. For example, when multiple blood vessels lie close together and the predicted labels merge into one connected region, the prediction counts as a false positive.

8.2 How I submitted

Using U-NET or FCN requires the following additional procedures (steps 1-3). Those who applied Mask R-CNN or YOLO do not need steps 1-3.

Step1: Cut off low probabilities in the predicted mask at a threshold of 0.8~0.84 (tuned by IoU before every submission).

Step2: Identify each predicted blood vessel individually with cv2.connectedComponents.

Step3: Calculate the mean probability of each component; this is the prediction confidence required by the competition.

Step4: Encode every predicted label using the provided function encode_binary_mask.

Step5: Submit the encoded labels with their confidences.

8.3 Kaggle Score

Both models were trained on 95% of the given images with max_epoch = 35, and the models with the lowest validation loss were selected. The Kaggle private scores are:

9. Conclusion

This project achieved an average validation IoU > 0.5 with both FCN and U-NET. First, the EDA revealed the label imbalance and the differences in distribution between datasets/sources. Based on the EDA, a custom loss function was developed to increase the importance of the positive labels, which occupy only a small part of each image. The image augmentation consists of horizontal and vertical flips, rotation, and random zoom, because these operations do not destroy the context (the tissue has no definitive orientation). It was effective in preventing overfitting given the limited number of images (only about 1600).

Both FCN and U-NET took similar computation time in validation: 25~45 sec. However, training the FCN was much faster: one epoch took about 100 sec for FCN versus 210 sec for U-NET, because the FCN architecture is much simpler and has fewer parameters. On the other hand, U-NET achieved a smaller validation loss at every epoch. Thus, the best architecture depends on the circumstances.

What did not work well

The following attempts did not improve the results:

Random brightness change is a typical augmentation, but it merely worsened both the validation and Kaggle scores. The reason is probably the small variation of brightness in the dataset.

Ideas for improvement

Section 7.3 revealed that the validation score was worst on dataset1/source2, likely due to its small number of images. Since the four images of a training batch were selected at random (batch size = 4), dataset1/source2 had relatively less chance of being selected. Sampling images with equal probability from each dataset/source would rectify this training imbalance and could improve precision.

References

[1] Fully Convolutional Networks for Semantic Segmentation, Jonathan Long, Evan Shelhamer, Trevor Darrell, UC Berkeley: https://arxiv.org/abs/1411.4038

[2] U-Net: Convolutional Networks for Biomedical Image Segmentation, Olaf Ronneberger, Philipp Fischer, Thomas Brox, Computer Science Department and BIOSS Centre for Biological Signalling Studies, University of Freiburg, Germany: https://arxiv.org/abs/1505.04597